Locating Discontinuities in Synthetic Speech using a Perceptually Orientated Approach

Authors

  • Joseph Timoney
  • Rudi Villing
  • Tomas Ward

Abstract

A significant problem with unit-selection-based speech synthesis is that listeners perceive discontinuities at the points where speech waveforms are joined. This work demonstrates the application of three different perceptually motivated time-frequency representations, and their associated measures, to the identification of such discontinuities.
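The abstract does not say which three representations or measures were used. As a purely illustrative sketch, assuming a mel-warped spectrum as one generic example of a perceptually motivated representation (a swapped-in technique, not necessarily one of the authors'), a join can be scored by comparing the band energies of the frames on either side of it. The function names (`hz_to_mel`, `mel_band_log_energies`, `join_discontinuity`), the 64-sample frame length and the 8-band count are hypothetical choices.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel with the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def mel_band_log_energies(frame, sample_rate, n_bands=8):
    """Sum spectral power into equal-width mel bands; return values in dB.

    Rectangular bands are a deliberate simplification (assumption) of a
    triangular mel filterbank.
    """
    spectrum = magnitude_spectrum(frame)
    n = len(frame)
    max_mel = hz_to_mel(sample_rate / 2.0)
    energies = [0.0] * n_bands
    for k, mag in enumerate(spectrum):
        freq = k * sample_rate / n
        band = min(int(hz_to_mel(freq) / max_mel * n_bands), n_bands - 1)
        energies[band] += mag * mag
    # A small floor avoids log10(0) for empty or silent bands.
    return [10.0 * math.log10(e + 1e-12) for e in energies]


def join_discontinuity(left_unit, right_unit, sample_rate,
                       frame_len=64, n_bands=8):
    """Hypothetical join score: Euclidean distance between the mel-band
    log energies of the last frame of the left unit and the first frame
    of the right unit. Larger values suggest a more audible join."""
    before = mel_band_log_energies(left_unit[-frame_len:], sample_rate, n_bands)
    after = mel_band_log_energies(right_unit[:frame_len], sample_rate, n_bands)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(before, after)))


if __name__ == "__main__":
    sr = 8000

    def tone(freq, n, start=0):
        # Sine wave sampled at sr, starting at sample index `start`.
        return [math.sin(2 * math.pi * freq * (i + start) / sr) for i in range(n)]

    # A join inside one continuous tone versus a join between two different tones.
    smooth = join_discontinuity(tone(500, 256), tone(500, 256, start=256), sr)
    abrupt = join_discontinuity(tone(500, 256), tone(3000, 256), sr)
    print(f"continuous join: {smooth:.2f}, mismatched join: {abrupt:.2f}")
```

The rectangular mel bands trade accuracy for brevity; a triangular filterbank or a fuller auditory model would sit closer to a genuinely perceptual representation.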

Similar resources

Perceptually-based Data-driven Join Co

Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...

Perceptually-based data-driven join costs: comparing join types

Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...

Feature transformation applied to the detection of discontinuities in concatenated speech

The quality of concatenated speech depends on the degree of mismatch between successive units. Defining a perceptually salient join cost to represent the degree of mismatch has proven to be a difficult task. Such a join cost is critical in unit selection synthesis to ensure that the optimum sequence of speech units is selected from the units available in the speech inventory. In this study the ...

Automatic Segmentation Combining and Spectral Boundary

Currently, AT&T Labs’ Natural Voices multilingual TTS system produces high-quality synthetic speech with a large-scale speech corpus [1]. In the development of such systems, automatic segmentation constitutes a major component technology. The prevalent approach for automatic segmentation in speech synthesis is Hidden Markov Model (HMM)-based. Even though an HMM-based approach is the most automat...

Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis

The quality of current commercial speech synthesis systems is now so high that system improvements are being made at subtle sub- and supra-segmental levels. Human perceptual evaluation of such subtle improvements requires a highly sophisticated level of perceptual attention to specific acoustic characteristics or cues. However, it is not well understood what acoustic cues listeners attend to by d...

Publication date: 2009